Abstract: We study the design of media streaming applications
in the presence of multiple heterogeneous wireless access
methods with different throughputs and costs. Our objective is 
to analytically characterize the trade-off between the usage cost 
and the Quality of user Experience (QoE), which is represented 
by the probability of interruption in media playback and the 
initial waiting time. We model each access network as a server 
that provides packets to the user according to a Poisson process 
with a certain rate and cost. Blocks are coded using random 
linear codes to alleviate the duplicate packet reception problem. 
Users must take decisions on how many packets to buffer before 
playout, and which networks to access during playout. 

We design, analyze and compare several control policies with 
a threshold structure. We formulate the problem of finding the 
optimal control policy as an MDP with a probabilistic constraint. 
We present the HJB equation for this problem by expanding the 
state space, and exploit it as a verification method for optimality 
of the proposed control law. 



I. Introduction 

Media streaming is fast becoming the dominant application 
on the Internet [1]. The popularity of such media transfers
has been accompanied by the growing usage of wireless 
handheld devices as the preferred means of media access. 
It is expected that such media streaming would happen in 
both a device to device (D2D) as well as in a base-station 
to device fashion, and both the hardware and applications 
needed for such communication schemes are already making 
an appearance [2], [3].

Media streaming is achieved by dividing a file into blocks, 
which are then further divided into packets for transmission. 
After each complete block is received, it can be decoded and 
played out. Since we consider a streaming application, blocks 
inherently have a sequence associated with them, and each 
block must be received by the time the previous one has been 
played out. The absence of a block at the time of playout 
would cause a frame freeze, which is to be avoided if possible. 
When there are multiple networks that can be used to access 
a particular piece of content (e.g. from a base station or a 
peer device) each device must take decisions on associating 
with one or more such access networks. However, the costs 
of different access methods might be different. For example, 
accessing the base station of a cellular network can result in 
additional charges per packet, while it might be possible to 
receive the same packets from the access point of a local 
WLAN or another device with a lower cost or possibly for 
free. Further, the cost of communication might be mitigated
by the initial amount of buffering before playout. Hence, there 
are trade-offs between the probability of frame skipping, the 
initial waiting time, and the cost of different access methods 
available. 

The objective of this paper is to understand the trade-off
between initial buffering and the usage of low-cost and costly
communication methods for attaining a target probability of
skip-free playout. We consider a system wherein network 
coding is used to ensure that packet identities can be ignored, 
and packets may potentially be obtained from two sources 
(servers) that have different rates of transmission. The wireless 
channel is unreliable, and we assume that each server can 
deliver packets according to a Poisson process with a known 
rate. Further, the costs of accessing the two servers are 
different; for simplicity we assume that one of the servers is 
free. Thus, our goal is to develop an algorithm that switches 
between the free and the costly servers in order to attain a 
target probability of skipping at lowest cost. 

Our contributions are as follows. We first develop an 
analytical characterization of the interruption probability for 
the single server case. Using this result, we obtain a lower 
bound on the cost of offline policies that do not observe the 
trajectory of packets received. We show that such policies have 
a threshold form in terms of the time of association with the 
costly server. Using the offline algorithm as a starting point, 
we develop an online algorithm with lower cost that has a 
threshold form - both free and costly servers are used until the 
queue length reaches a threshold, followed by only free server 
usage. We then develop an online algorithm in which the risk 
of interruption is spread out across the trajectory. Here, only 
the free server is used whenever the queue length is above a 
certain threshold, while both servers are used when the queue 
length is below the threshold. The threshold is designed as a 
function of the initial buffer size and the desired interruption 
probability. 

We formulate the problem of finding the optimal network 
association policy as a Markov Decision Process with a proba- 
bilistic constraint. Similarly to the Bellman equation proposed 
by Chen [4] for a discrete time MDP with probabilistic
constraints, we write the Hamilton-Jacobi-Bellman equation 
for the problem. Using a guess and check approach, we derive 
an approximate solution of the HJB equation, and show that 
the optimal policy given by the approximate value function 
takes a threshold form. 

Media streaming, particularly in the area of P2P networks,
has attracted significant recent interest. For example, works
such as [5], [6], [7] develop analytical models of the trade-off
between the steady state probability of missing a block and 
buffer size under different block selection policies. Unlike our 
model, they consider live streaming with deterministic chan- 
nels. The use of random linear codes considerably simplifies 
packet selection [8], [9], [10], [11], and we can use the same
idea to ensure that packets can be received from multiple 
sources without the need to coordinate the exact identities of 
the packets from each. However, we focus on content that is 
already cached at multiple locations, and must be streamed 
over one or more unreliable channels. Related to our work 
is [12], which considers two possible wireless access methods
(WiFi and UMTS) for file delivery, assuming particular
throughput models for each access method. In contrast to this 
work, packet arrivals are stochastic in our model, and our 
streaming application requires hard constraints on quality of 
user experience. 

II. System Model and QoE Metrics 

We consider a media streaming system as follows. A single 
user is receiving a media file of size F from various servers it 
is connected to. Each server could be a wireless access point 
or another wireless user operating as a server. The receiver 
first buffers D packets from the beginning of the file, and 
then starts the playback. 

We assume that time is continuous, and the arrival process of 
packets from each server is a Poisson process independent of 
other arrival processes. Further, we assume that no redundant 
packet is delivered from different servers. This assumption can 
be justified if there is no delay in the feedback to the servers, 
or by sending random linear combination of the packets in the 
server (see [13] and [14] for more details). Therefore, we can
combine the arrival processes of any subset $S$ of the servers
into one Poisson process of rate $R_S$ equal to the sum
of the rates from the corresponding servers.

There are two types of servers in the system: free servers 
and the costly ones. There is no cost associated with receiving 
packets from a free server, but a unit cost is incurred for each 
(coded) packet delivered by any costly server. As described 
above, we can combine all the free servers into one free server 
from which packets arrive according to a Poisson process of 
rate $R_0$. Similarly, we can merge all of the costly servers into
one costly server with effective rate $R_c$. At any time $t$, the
user has the option to use packets only from the free server
or from both the free and the costly servers. In the latter case,
the packets arrive according to a Poisson process of rate $R_1 =
R_0 + R_c$. The user's action at time $t$ is denoted by $u_t \in \{0, 1\}$,
where $u_t = 0$ if only the free server is used at time $t$, while
$u_t = 1$ if both free and costly servers are used. We normalize
the playback rate to one, i.e., it takes one unit of time to play
a single packet. We also assume that the parameters $R_0$ and
$R_1$ are known at the receiver.

The dynamics of the receiver's buffer size (queue-length)
$x_t$ can be described as follows

$$x_t = D + N_t + \int_0^t u_\tau \, dN^c_\tau - t, \tag{1}$$

where $D$ is the initial buffer size, $N_t$ is a Poisson process of
rate $R_0$, and $N^c_t$ is a Poisson counter of rate $R_c$ which is
independent of the process $N_t$. The last term corresponds to
the unit rate of media playback.

The user's association (control) policy is formally defined 
below. 

Definition 1. [Control Policy] Let $h_t = \{x_s : 0 \le s \le t\} \cup
\{u_s : 0 \le s < t\}$ denote the history of the buffer sizes
and actions up to time $t$, and $H$ be the set of all histories for
all $t$. A deterministic association policy denoted by $\pi$ is a
mapping $\pi : H \to \{0, 1\}$, where at any time $t$

$$\pi(h_t) = \begin{cases} 0, & \text{if only the free server is chosen,} \\ 1, & \text{if both servers are chosen.} \end{cases}$$

Denote by $\Pi$ the set of all such control policies.

We declare an interruption in playback when the buffer
size decreases to zero before reaching the end of the file, i.e.,
when there is no packet at the receiver to be played but the
file is not completely downloaded. More precisely, let

$$\tau_e = \inf\{t : x_t \le 0\}, \qquad \tau_f = \inf\{t : x_t \ge F - t\}, \tag{2}$$

where $\tau_f$ corresponds to the time of completing the file download,
because we have already played $\tau_f$ packets and the buffer
contains the remaining $F - \tau_f$ packets to be played. The video
streaming is interrupted if and only if $\tau_e < \tau_f$.

We consider the following metrics to quantify Quality of 
user Experience (QoE). The first metric is the initial waiting 
time before the playback starts. This is directly captured by 
the initial buffer size D. Another metric that affects QoE is the 
probability of interruption during the playback for a particular 
control policy $\pi$ denoted by

$$p^\pi(D) = \mathbf{P}\{\tau_e < \tau_f\}, \tag{3}$$

where $\tau_e$ and $\tau_f$ are defined in (2).

Definition 2. The policy $\pi$ is defined to be $(D, \epsilon)$-feasible if
$p^\pi(D) \le \epsilon$. The set of all such feasible policies is denoted by
$\Pi(D, \epsilon)$.

The third metric that we consider in this work is the 
expected cost of using the costly server which is proportional 
to the expected usage time of the costly server. For any $(D, \epsilon)$,
the usage cost of a $(D, \epsilon)$-feasible policy $\pi$ is given by$^1$

$$J^\pi(D, \epsilon) = \mathbf{E}\left[\int_0^{\tau_f} u_t \, dt\right]. \tag{4}$$



The value function or optimal cost function $V$ is defined as

$$V(D, \epsilon) = \min_{\pi \in \Pi(D, \epsilon)} J^\pi(D, \epsilon), \tag{5}$$

and the optimal policy $\pi^*$ is defined as the optimal solution
of the minimization problem in (5).
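The interruption probability and the usage cost of a given policy can be estimated by simulation. The following is a minimal sketch, not the authors' simulator: the function names, the `policy(x, t)` callback signature, the random seed, and the `escape_level` truncation (used to approximate an infinite file) are all assumptions made for illustration. The free and costly Poisson streams are merged into a single stream, and a costly arrival is counted only when the policy returns 1 at that instant, which mirrors the integral term in the buffer dynamics.

```python
import random


def simulate_once(D, R0, Rc, policy, rng, escape_level=40.0):
    """Simulate one buffer trajectory; return (interrupted, costly_packets).

    Arrivals come from a merged Poisson stream of rate R0 + Rc. A costly
    arrival enters the buffer only if policy(x, t) == 1 at that instant.
    The run stops once the buffer exceeds escape_level (an assumed
    truncation; it should be large enough that a later interruption is
    negligible for the rates in use).
    """
    x, t, costly = float(D), 0.0, 0
    total_rate = R0 + Rc
    while x < escape_level:
        dt = rng.expovariate(total_rate)
        if dt >= x:                    # buffer drains to zero before the next arrival
            return True, costly
        x -= dt                        # unit-rate playback
        t += dt
        if rng.random() < R0 / total_rate:
            x += 1.0                   # packet from the free server
        elif policy(x, t) == 1:
            x += 1.0                   # packet from the costly server
            costly += 1
    return False, costly


def estimate(D, R0, Rc, policy, trials=5000, seed=0):
    """Monte Carlo estimates of the interruption probability and of the
    expected number of packets taken from the costly server."""
    rng = random.Random(seed)
    hits, packets = 0, 0
    for _ in range(trials):
        interrupted, costly = simulate_once(D, R0, Rc, policy, rng)
        hits += interrupted
        packets += costly
    return hits / trials, packets / trials
```

Since costly packets arrive at rate $R_c$ while the costly server is in use, dividing the estimated packet count by $R_c$ gives an estimate of the usage time. A policy that needs memory (for example, one that remembers whether a threshold has already been crossed) would have to be reset between trials, which this sketch leaves to the caller.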

In our model, the user expects to have an interruption-free
experience with probability higher than a desired level
$1 - \epsilon$. Note that there is a fundamental trade-off between the
interruption probability $\epsilon$, the initial buffer size $D$, and the
usage cost. These trade-offs depend on the association policy
as well as the system parameters $R_0$, $R_c$ and $F$.

$^1$ Throughout this work, we use the convention that the cost of an infeasible
policy is infinite.

We first characterize the trade-offs between the QoE metrics 
for degenerate control policies. Next, we use these results to 
design association policies. 

III. QoE Trade-offs for the Single-Server 
Problem 

Consider a single-server problem where the receiver re- 
ceives the packets according to a Poisson process of rate 
R. The user's only decision in this case is the initial buffer 
size D. We would like to characterize the optimal trade-off 
between the initial buffer size and the interruption probability 
p(D) by providing bounds on the interruption probability as a 
function of the system parameters R and F. An upper bound 
(achievability) on $p(D)$ is particularly useful, since it provides
a sufficient condition for desirable user experience. A lower 
bound (converse) on p(D) demonstrates how tight the upper 
bound is. 

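Theorem 1. For the initial buffer size $D$, let $p(D)$ be the
interruption probability of a single-server system defined as
in (3). Define $\gamma(r)$ as

$$\gamma(r) = r + R(e^{-r} - 1), \tag{6}$$

and $\bar r(R)$ as the largest root of $\gamma(r)$, i.e.,

$$\bar r(R) = \sup\{r : \gamma(r) = 0\}. \tag{7}$$

Then for all $R > 1$,

$$e^{-\bar r(R) D} - 2e^{-cF} \le p(D) \le e^{-\bar r(R) D}, \tag{8}$$

where $c > 0$ denotes the coefficient of $F$ in the exponent (see [14]).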

Proof: We do not include the proof owing to space
limitations. See [14] for a complete proof. ■

Note that the upper and lower bounds on $p(D)$
given by Theorem 1 are asymptotically tight as $F$ goes to
infinity. Therefore, for $F = \infty$, by continuity of the probability
measure we get

$$p(D) = \mathbf{P}\Big(\min_{t \ge 0} x_t \le 0 \,\Big|\, x_0 = D\Big) = e^{-\bar r(R) D}. \tag{9}$$



Using this characterization, we can identify the ranges of 
the QoE metrics for which there exists no feasible policy or 
the costly server is not required. 
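The exponent $\bar r(R)$ has no closed form, but it is easy to compute numerically. The sketch below uses bisection; the function names and the tolerance are assumptions chosen for illustration. It relies on the fact that $\gamma$ is convex with $\gamma(0) = 0$, is negative at its minimizer $r = \log R$, and satisfies $\gamma(R) = R e^{-R} > 0$, so for $R > 1$ the positive root lies between $\log R$ and $R$.

```python
import math


def gamma(r, R):
    """gamma(r) = r + R (e^{-r} - 1), as in (6)."""
    return r + R * (math.exp(-r) - 1.0)


def largest_root(R, tol=1e-12):
    """Largest root of gamma for R > 1, found by bisection on (log R, R)."""
    if R <= 1.0:
        raise ValueError("the characterization requires R > 1")
    lo, hi = math.log(R), R          # gamma(lo) < 0 < gamma(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma(mid, R) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interruption_probability(D, R):
    """p(D) = exp(-rbar(R) D) for an infinite file, as in (9)."""
    return math.exp(-largest_root(R) * D)


def min_buffer(eps, R):
    """Smallest initial buffer D with p(D) <= eps on a single server of rate R."""
    return math.log(1.0 / eps) / largest_root(R)
```

For instance, `min_buffer(1e-3, 1.05)` gives the initial buffer that a single server of rate 1.05 needs in order to keep the interruption probability below $10^{-3}$.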

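Corollary 1. (a) For any $(D, \epsilon)$ such that $D \ge \frac{1}{\bar r(R_0)} \log\big(\frac{1}{\epsilon}\big)$,

$$\min_{\pi \in \Pi} J^\pi(D, \epsilon) = 0.$$

(b) For any $(D, \epsilon)$ such that $D < \frac{1}{\bar r(R_1)} \log\big(\frac{1}{\epsilon}\big)$,

$$\min_{\pi \in \Pi} J^\pi(D, \epsilon) = \infty.$$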



Proof: Consider the degenerate policy $\pi_0 \equiv 0$. This policy
is equivalent to a single-server system with arrival rate $R =
R_0$. By Definition 2 and (9), the policy $\pi_0$ is $(D, \epsilon)$-feasible
for all $D \ge \frac{1}{\bar r(R_0)} \log\big(\frac{1}{\epsilon}\big)$. Note that this policy does
not incur any cost, which results in part (a).

Moreover, for all $(D, \epsilon)$ with $D < \frac{1}{\bar r(R_1)} \log\big(\frac{1}{\epsilon}\big)$ there is
no $(D, \epsilon)$-feasible policy. This is so since the buffer size under
any policy $\pi$ is stochastically dominated by the one governed
by the degenerate policy $\pi_1 \equiv 1$. Hence,

$$p^\pi(D) \ge p^{\pi_1}(D) = \exp(-\bar r(R_1) D) > \epsilon.$$

Using the convention of infinite cost for infeasible policies,
we obtain the result in part (b). ■

For simplicity of notation, let $\alpha_0 = \bar r(R_0)$, and $\alpha_1 = \bar r(R_1)$.
Throughout the rest of this paper, we study the case that the
file size $F$ is infinite, since the control policies in this case take
simpler forms and the cost of such control policies provides an
upper bound for the finite file size case. Further, by Corollary
1 we focus on the region

$$\mathcal{R} = \Big\{(D, \epsilon) : \frac{1}{\alpha_1} \log\Big(\frac{1}{\epsilon}\Big) \le D \le \frac{1}{\alpha_0} \log\Big(\frac{1}{\epsilon}\Big)\Big\} \tag{10}$$

to analyze the expected cost of various classes of control 
policies. 
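The three regimes separated by Corollary 1 can be expressed as a small helper. This is a sketch only: the function name and the returned labels are hypothetical, and the exponents `alpha0` and `alpha1` are assumed to have been computed beforehand as the largest roots of $\gamma$ for $R_0$ and $R_1$.

```python
import math


def classify(D, eps, alpha0, alpha1):
    """Place (D, eps) relative to the region in (10)."""
    lower = math.log(1.0 / eps) / alpha1   # below this, no policy is feasible
    upper = math.log(1.0 / eps) / alpha0   # at or above this, the free server suffices
    if D < lower:
        return "infeasible"
    if D >= upper:
        return "free-only"
    return "trade-off"
```

With the approximate values `alpha0 = 0.0984` and `alpha1 = 0.3764` (assumed here for rates 1.05 and 1.2) and `eps = 1e-3`, buffers between roughly 18 and 70 packets fall in the trade-off region.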



IV. Design and Analysis of Association Policies 

In this section, we propose several classes of parameter- 
ized control policies. We first characterize the range of the 
parameters for which the association policy is feasible for a 
given initial buffer size D and the desired level of interruption 
probability $\epsilon$. Then, we try to choose the parameters such that
the expected cost of the policy is minimized. 

A. Off-line Policy 

Consider the class of policies where the decisions are 
made off-line before starting media streaming. In this case, 
the arrival process is not observable by the decision maker. 
Therefore, the user's decision space reduces to the set of
deterministic functions $u : \mathbb{R} \to \{0, 1\}$ that map time into
the action space.

Theorem 2. Let the cost of a control policy be defined as
in (4). In order to find a minimum-cost off-line policy, it is
sufficient to consider policies of the following form:

$$\pi(h_t) = u_t = \begin{cases} 1, & \text{if } t \le t_s, \\ 0, & \text{if } t > t_s. \end{cases} \tag{11}$$



Proof: In general any off-line policy $\pi$ consists of multiple
intervals in which the costly server is used. Consider an
alternative policy $\pi'$ of the form of (11) where $t_s = J^\pi$. By
definition of the cost function in (4), the two policies incur the
same cost. Moreover, the buffer size process under policy $\pi$ is
stochastically dominated by the one under policy $\pi'$, because
the policy $\pi'$ counts the arrivals from the costly server earlier,
and the arrival process is stationary. Hence, the interruption
probability of $\pi'$ is not larger than that of $\pi$. Therefore, for
any off-line policy, there exists another off-line policy of the
form given by (11) with the same cost and an interruption
probability that is no larger. ■

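Theorem 3. Consider the class of off-line policies of the
form (11). For any $(D, \epsilon) \in \mathcal{R}$, the policy $\pi$ defined in (11) is
feasible if

$$t_s \ge t_s^* = \frac{R_0}{R_1 - R_0} \Big[\frac{1}{\alpha_0} \log\Big(\frac{1}{\epsilon - e^{-\alpha_1 D}}\Big) - D\Big]. \tag{12}$$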
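Proof: By Definition 2 we need to show that $p^\pi(D) \le \epsilon$.
By a union bound on the interruption probability, it is sufficient
to verify

$$\mathbf{P}\Big(\min_{0 \le t \le t_s} x_t \le 0 \,\Big|\, x_0 = D\Big) + \mathbf{P}\Big(\min_{t > t_s} x_t \le 0 \,\Big|\, x_0 = D\Big) \le \epsilon. \tag{13}$$

In the interval $[0, t_s]$, $x_t$ behaves as in a single-server system
with rate $R_1$. Hence, by Theorem 1 we get

$$\mathbf{P}\Big(\min_{0 \le t \le t_s} x_t \le 0 \,\Big|\, x_0 = D\Big) \le e^{-\alpha_1 D}. \tag{14}$$

For the second term in (13), we have

$$\begin{aligned}
\mathbf{P}\Big(\min_{t > t_s} x_t \le 0 \,\Big|\, x_0 = D\Big)
&= \sum_{q = D - t_s}^{\infty} \mathbf{P}\Big(\min_{t > t_s} x_t \le 0 \,\Big|\, x_{t_s} = q\Big)\, \mathbf{P}(x_{t_s} = q) \\
&\overset{(a)}{\le} \sum_{q = D - t_s}^{\infty} e^{-\alpha_0 q}\, \mathbf{P}(x_{t_s} = q) \\
&= \sum_{k=0}^{\infty} e^{-\alpha_0 (D + k - t_s)}\, \mathbf{P}(N_{t_s} + N^c_{t_s} = k) \\
&\overset{(b)}{=} \sum_{k=0}^{\infty} e^{-\alpha_0 (D + k - t_s)}\, e^{-R_1 t_s} \frac{(R_1 t_s)^k}{k!} \\
&= e^{-\alpha_0 (D - t_s) + R_1 t_s (e^{-\alpha_0} - 1)} \sum_{k=0}^{\infty} e^{-R_1 t_s e^{-\alpha_0}} \frac{(R_1 t_s e^{-\alpha_0})^k}{k!} \\
&= \exp\big(-\alpha_0 (D - t_s) + R_1 t_s (e^{-\alpha_0} - 1)\big) \\
&\overset{(c)}{=} \exp\Big(-\alpha_0 (D - t_s) - \frac{R_1}{R_0} \alpha_0 t_s\Big) \\
&\overset{(d)}{\le} \epsilon - e^{-\alpha_1 D},
\end{aligned}$$

where (a) follows from Theorem 1 and the fact that $u_t = 0$
for $t > t_s$. (b) is true because $N_{t_s} + N^c_{t_s}$ is a Poisson random
variable with mean $R_1 t_s$. (c) holds since $\alpha_0$ is the root of
$\gamma(r)$ defined in (6) for $R = R_0$. Finally, (d) follows from the
hypothesis of the theorem.

By combining the above bounds, we may verify (13), which
in turn proves feasibility of the proposed control policy. ■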

Note that obtaining the optimal off-line policy is equivalent
to finding the smallest $t_s$ for which the policy is still feasible.
Therefore, $t_s^*$ given in (12) provides an upper bound on the
minimum cost of an off-line policy. Observe that $t_s^*$ is almost
linear in $D$ for all $(D, \epsilon)$ that are not too close to the lower
boundary of the region $\mathcal{R}$. As $(D, \epsilon)$ gets closer to the boundary, $t_s^*$
and the expected cost grow to infinity, which is in agreement
with Corollary 1. In this work we pick $t_s^*$ as a benchmark for
comparison to other policies that we present next.
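As a numerical illustration of the off-line switching time, the sketch below evaluates the sufficient value of $t_s$ for rates $R_0 = 1.05$ and $R_1 = 1.2$. The function name is hypothetical, and the exponents `ALPHA0` and `ALPHA1` are approximate values assumed to come from solving $\gamma(r) = 0$ numerically for each rate.

```python
import math


def offline_switch_time(D, eps, R0, R1, alpha0, alpha1):
    """Sufficient switching time t_s^* of the off-line policy, as in (12)."""
    slack = eps - math.exp(-alpha1 * D)
    if slack <= 0.0:
        raise ValueError("(D, eps) lies outside the region where (12) applies")
    return R0 / (R1 - R0) * (math.log(1.0 / slack) / alpha0 - D)


# Assumed example values; ALPHA0 and ALPHA1 are approximate roots of gamma.
R0, R1 = 1.05, 1.2
ALPHA0, ALPHA1 = 0.09839, 0.37644

t_star = offline_switch_time(20, 1e-3, R0, R1, ALPHA0, ALPHA1)
costly_packets = (R1 - R0) * t_star   # expected packets from the costly server
print(round(t_star, 1), round(costly_packets, 1))
```

Multiplying the usage time by $R_c = R_1 - R_0$ converts the cost from units of time into an expected number of costly packets.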

B. Online Safe Policy 

Let us now consider the class of online policies where the
decision maker can observe the buffer size history. Inspired by
the structure of the optimal off-line policies, we first focus on
a safe control policy in which, in order to avoid interruptions,
the costly server is used at the beginning until the buffer size
reaches a certain threshold, after which the costly server is
never used. This policy is formally defined below.

Definition 3. The online safe policy $\pi^S$ parameterized by the
threshold value $S$ is given by

$$\pi^S(h_t) = \begin{cases} 1, & \text{if } t \le \tau_S, \\ 0, & \text{if } t > \tau_S, \end{cases} \tag{15}$$

where $\tau_S = \inf\{t \ge 0 : x_t \ge S\}$.

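Theorem 4. Let $\pi^S$ be the safe policy defined in Definition
3. For any $(D, \epsilon) \in \mathcal{R}$, the safe policy is feasible if

$$S \ge S^* = \frac{1}{\alpha_0} \log\Big(\frac{1}{\epsilon - e^{-\alpha_1 D}}\Big). \tag{16}$$

Moreover,

$$\min_{S \ge S^*} J^{\pi^S}(D, \epsilon) = J^{\pi^{S^*}}(D, \epsilon) = \frac{1}{R_1 - 1} \Big[\frac{1}{\alpha_0} \log\Big(\frac{1}{\epsilon - e^{-\alpha_1 D}}\Big) - D + \xi\Big],$$

where $\xi \in [0, 1)$.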

Proof: Similar to the proof of Theorem 3, we need to
show that the total probability of interruption before and after
crossing the threshold $S$ is bounded from above by $\epsilon$. Observe
that for any realization of $\tau_S$ the bound in (14) still holds.
Further, since the costly server is not used after crossing the
threshold and $x_{\tau_S} \ge S$, Theorem 1 implies

$$\mathbf{P}\Big(\min_{t \ge \tau_S} x_t \le 0 \,\Big|\, x_0 = D\Big) \le e^{-\alpha_0 S} \le \epsilon - e^{-\alpha_1 D}, \tag{17}$$

where the second inequality follows from (16). Finally, combining
(14) and (17) gives $p^\pi(D) \le \epsilon$, which is the desired
feasibility result.

For the second part, first observe that $J^{\pi^S}(D, \epsilon) = \mathbf{E}[\tau_S]$.
In order to cross a threshold $S > S^*$, the threshold $S^*$
must be crossed earlier, because $x_0 = D \le S^*$. Hence, $\tau_S$
stochastically dominates $\tau_{S^*}$, implying

$$J^{\pi^S}(D, \epsilon) = \mathbf{E}[\tau_S] \ge \mathbf{E}[\tau_{S^*}] = J^{\pi^{S^*}}(D, \epsilon), \quad \text{for all } S \ge S^*.$$

It only remains to compute $\mathbf{E}[\tau_{S^*}]$. It follows from Wald's
identity or Doob's optional stopping theorem [13] that

$$D + (R_1 - 1)\, \mathbf{E}[\tau_{S^*}] = \mathbf{E}[x_{\tau_{S^*}}] = S^* + \xi, \tag{18}$$

where $\xi \in [0, 1)$ because the jumps of a Poisson process
are of unit size, and hence the overshoot size when crossing
a threshold is bounded by one, i.e., $S^* \le x_{\tau_{S^*}} < S^* + 1$.
Rearranging the terms in (18) and plugging in the value of $S^*$
from (16) immediately gives the result. ■
Let us now compare the online safe policy $\pi^{S^*}$ with the
off-line policy defined in (11) with parameter $t_s^*$ as in (12).
We observe that the cost of the online safe policy is almost
proportional to that of the off-line policy, where the cost ratio
of the off-line policy to that of the online safe policy is given
by

$$\frac{R_0 (R_1 - 1)}{R_1 - R_0} = 1 + \frac{R_1 (R_0 - 1)}{R_1 - R_0} > 1.$$
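The safe threshold, its cost, and the cost ratio can be evaluated in the same way. The sketch below is illustrative only: the function names are hypothetical, and the exponents passed in are assumed to be the (approximate) roots of $\gamma$ for $R_0$ and $R_1$. Because the overshoot $\xi$ is only known to lie in $[0, 1)$, the cost is returned as a pair of bounds.

```python
import math


def safe_threshold(D, eps, alpha0, alpha1):
    """Threshold S^* of the online safe policy, as in (16)."""
    slack = eps - math.exp(-alpha1 * D)
    if slack <= 0.0:
        raise ValueError("(D, eps) lies outside the feasible region")
    return math.log(1.0 / slack) / alpha0


def safe_cost_bounds(D, eps, R1, alpha0, alpha1):
    """Bounds on E[tau_{S^*}] obtained from (18) with xi = 0 and xi = 1."""
    s_star = safe_threshold(D, eps, alpha0, alpha1)
    return (s_star - D) / (R1 - 1.0), (s_star - D + 1.0) / (R1 - 1.0)


def offline_to_safe_ratio(R0, R1):
    """Cost ratio R0 (R1 - 1) / (R1 - R0) of the off-line to the safe policy."""
    return R0 * (R1 - 1.0) / (R1 - R0)
```

For $R_0 = 1.05$ and $R_1 = 1.2$ the ratio evaluates to 1.4, so observing the buffer saves roughly 30% of the costly-server usage relative to the off-line benchmark.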

Note that the structure of both policies is the same, i.e., both
policies use the costly server for a certain period of time
and then switch back to the free server. As suggested here,
observing the buffer size allows the online
policies to avoid excessive use of the costly server when there
is a sufficiently large number of arrivals from the free server.
In the following, we present another class of online policies.

C. Online Risky Policy 

In this part, we study a class of online policies where 
the costly server is used only if the buffer size is below a 
certain threshold. We call such policies "risky" as the risk of 
interruption is spread out across the whole trajectory unlike the 
"safe" policies. Further, we constrain risky policies to possess 
the property that the action at a particular time should only 
depend on the buffer size at that time, i.e., such policies are 
stationary Markov with respect to buffer size as the state of 
the system. The risky policy is formally defined below. 

Definition 4. The online risky policy $\pi^T$ parameterized by the
threshold value $T$ is given by

$$\pi^T(h_t) = \begin{cases} 1, & \text{if } 0 \le x_t < T, \\ 0, & \text{otherwise.} \end{cases} \tag{19}$$



Lemma 1. Let $x_t$ be the buffer size of a single-server system
with arrival rate $R > 1$. Let the initial buffer size be $D$ and
for any $T \ge D \ge 0$ define the following stopping times

$$\tau_T = \inf\{t \ge 0 : x_t \ge T\}, \qquad \tau_e = \inf\{t \ge 0 : x_t \le 0\}. \tag{20}$$

Then

$$\mathbf{P}(\tau_e > \tau_T) = \frac{1 - e^{-\bar r(R) D}}{1 - \mathbf{E}[e^{-\bar r(R) x_{\tau_T}} \mid \tau_e > \tau_T]}, \tag{21}$$

where $\bar r(R)$ is defined in (7).

Proof: Let $Y(t) = e^{-\bar r(R) x_t}$. We may verify that $Y(t)$ is a
martingale and uniformly integrable. Also, define the stopping
time $\tau = \min\{\tau_T, \tau_e\}$. Since $R > 1$, we have $\mathbf{P}(\tau > t) \le
\mathbf{P}(0 < x_t < T) \to 0$ as $t \to \infty$. Hence, $\tau < \infty$ almost surely.
Therefore, we can employ Doob's optional stopping theorem
[13] to write

$$e^{-\bar r(R) D} = \mathbf{E}[Y(0)] = \mathbf{E}[Y(\tau)] = \mathbf{P}(\tau_e < \tau_T) \cdot 1 + \mathbf{P}(\tau_e > \tau_T)\, \mathbf{E}[e^{-\bar r(R) x_{\tau_T}} \mid \tau_e > \tau_T].$$

The claim immediately follows from the above relation after
rearranging the terms. ■

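Theorem 5. Let $\pi^T$ be the risky policy defined in Definition 4.
For any $(D, \epsilon) \in \mathcal{R}$, the policy $\pi^T$ is feasible if the threshold
$T$ satisfies

$$T \ge T^* = \begin{cases} \dfrac{1}{\alpha_1 - \alpha_0} \Big[\log\Big(\dfrac{\beta}{\epsilon}\Big) - \alpha_0 D\Big], & \text{if } D \ge \bar D, \\[2ex] \dfrac{1}{\alpha_1} \log(\cdots), & \text{if } D < \bar D, \end{cases} \tag{22}$$

where $\beta = \frac{\alpha_1}{\alpha_0 (1 - \alpha_0 / 2)}$ and $\bar D = \frac{1}{\alpha_1} \log\big(\frac{\beta}{\epsilon}\big)$.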



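Proof: Let us first characterize the interruption probability
of the policy $\pi^T$ when the initial buffer size is $D = T$. In this
case, by definition of $\pi^T$ the behavior of $x_t$ is initially the same
as a single-server system with rate $R_1$ until the threshold $T$
is crossed. Hence, by Lemma 1 we have

$$\begin{aligned}
p^{\pi^T}(T) &= \mathbf{P}\Big(\min_{t \ge 0} x_t \le 0 \,\Big|\, x_0 = T\Big) \\
&= \mathbf{P}(\tau_e < \tau_T) \cdot 1 + \mathbf{P}(\tau_T < \tau_e)\, \mathbf{P}\Big(\min_{t \ge \tau_T} x_t \le 0 \,\Big|\, \tau_T < \tau_e, x_0 = T\Big) \\
&= \frac{e^{-\alpha_1 T} - \mathbf{E}[e^{-\alpha_1 x_{\tau_T}} \mid \tau_e > \tau_T]}{1 - \mathbf{E}[e^{-\alpha_1 x_{\tau_T}} \mid \tau_e > \tau_T]} + \frac{1 - e^{-\alpha_1 T}}{1 - \mathbf{E}[e^{-\alpha_1 x_{\tau_T}} \mid \tau_e > \tau_T]}\, \mathbf{P}\Big(\min_{t \ge \tau_T} x_t \le 0 \,\Big|\, \tau_T < \tau_e, x_0 = T\Big).
\end{aligned} \tag{23}$$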



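Further,

$$\begin{aligned}
\mathbf{P}\Big(\min_{t \ge \tau_T} x_t \le 0 \,\Big|\, \tau_T < \tau_e, x_0 = T\Big)
&= \int_T^{T+1} \mathbf{P}\Big(\min_{t \ge \tau_T} x_t \le 0 \,\Big|\, x_{\tau_T}\Big)\, d\mu(x_{\tau_T}) \\
&\overset{(a)}{=} \int_T^{T+1} \mathbf{P}\Big(\min_{t \ge 0} x_t \le 0 \,\Big|\, x_0\Big)\, d\mu(x_0) \\
&\overset{(b)}{=} \int_T^{T+1} \mathbf{P}\Big(\min_{t \ge 0} x_t \le 0 \,\Big|\, \min_{t \ge 0} x_t \le T, x_0\Big)\, \mathbf{P}\Big(\min_{t \ge 0} x_t \le T \,\Big|\, x_0\Big)\, d\mu(x_0) \\
&\overset{(c)}{=} \int_T^{T+1} p^{\pi^T}(T)\, e^{-\alpha_0 (x_0 - T)}\, d\mu(x_0) \\
&= \mathbf{E}\big[e^{-\alpha_0 (x_{\tau_T} - T)} \mid \tau_T < \tau_e\big]\, p^{\pi^T}(T),
\end{aligned} \tag{24}$$

where $\mu$ denotes the conditional distribution of $x_{\tau_T}$ given $\tau_T <
\tau_e$. Note that $x_{\tau_T} \in [T, T+1]$ because the size of the overshoot
is bounded by one. Further, (a) follows from stationarity of the
arrival processes and the control policy, (b) holds because a
necessary condition for the interruption event is to cross the
threshold $T$ when starting from a point $x_0 \ge T$. Finally (c)
follows from (9) and the definition of the risky policy. The
relations (23) and (24) together result in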



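$$p^{\pi^T}(T) = \frac{e^{-\alpha_1 T} \big(1 - \mathbf{E}_\mu[e^{-\alpha_1 (x_{\tau_T} - T)}]\big)}{1 - \mathbf{E}_\mu[e^{-\alpha_0 (x_{\tau_T} - T)}] + \kappa}, \tag{25}$$

where $\kappa = \mathbf{E}_\mu[e^{-\alpha_0 x_{\tau_T} - (\alpha_1 - \alpha_0) T}] - \mathbf{E}_\mu[e^{-\alpha_1 x_{\tau_T}}] \ge 0$.
Therefore, using the fact that

$$1 - x \le e^{-x} \le 1 - x + \frac{x^2}{2}, \quad \text{for all } x \ge 0, \tag{26}$$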



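we can provide the following bound

$$p^{\pi^T}(T) \le \frac{e^{-\alpha_1 T}\, \alpha_1 \mathbf{E}_\mu[x_{\tau_T} - T]}{\alpha_0 \mathbf{E}_\mu[x_{\tau_T} - T] \Big(1 - \dfrac{\alpha_0}{2} \dfrac{\mathbf{E}_\mu[(x_{\tau_T} - T)^2]}{\mathbf{E}_\mu[x_{\tau_T} - T]}\Big)} \le \frac{\alpha_1}{\alpha_0 \big(1 - \frac{\alpha_0}{2}\big)}\, e^{-\alpha_1 T} = \beta e^{-\alpha_1 T}, \tag{27}$$

where the last inequality holds since $0 \le x_{\tau_T} - T \le 1$.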

Now we prove feasibility of the risky policy $\pi^{T^*}$ when $D \ge
\bar D$. Observe that by (22), $D \ge T^*$, hence the behavior of the
buffer size $x_t$ is the same as the one in a single-server system
with rate $R_0$ until the threshold $T^*$ is crossed. Thus

$$\begin{aligned}
p^{\pi^{T^*}}(D) &= \mathbf{P}\Big(\min_{t \ge 0} x_t \le 0 \,\Big|\, x_0 = D\Big) \\
&= \mathbf{P}\Big(\min_{t \ge 0} x_t \le 0 \,\Big|\, \min_{t \ge 0} x_t \le T^*, x_0 = D\Big)\, \mathbf{P}\Big(\min_{t \ge 0} x_t \le T^* \,\Big|\, x_0 = D\Big) \\
&= p^{\pi^{T^*}}(T^*)\, e^{-\alpha_0 (D - T^*)} \\
&\le \beta e^{-(\alpha_1 - \alpha_0) T^* - \alpha_0 D} = \epsilon,
\end{aligned}$$

where the inequality follows from (27), and the last equality
holds by (22).

Next we verify the feasibility of the policy $\pi^{T^*}$ for $D < \bar D$.
In this case, $D < T^*$ and by definition of the risky policy
the system behaves as a single-server system with arrival
rate $R_1$ until the threshold $T^*$ is crossed or the buffer size
hits zero (interruption). Hence, we can bound the interruption
probability as follows

$$\begin{aligned}
p^{\pi^{T^*}}(D) &= \mathbf{P}(\tau_e < \tau_{T^*}) \cdot 1 + \mathbf{P}(\tau_{T^*} < \tau_e)\, \mathbf{P}\Big(\min_{t \ge \tau_{T^*}} x_t \le 0 \,\Big|\, \tau_{T^*} < \tau_e\Big) \\
&\overset{(a)}{=} 1 - \mathbf{P}(\tau_{T^*} < \tau_e) \Big(1 - \mathbf{E}_\mu\big[e^{-\alpha_0 (x_{\tau_{T^*}} - T^*)}\big]\, p^{\pi^{T^*}}(T^*)\Big) \\
&\overset{(b)}{=} \cdots \overset{(c)}{\le} \cdots \overset{(d)}{=} \epsilon,
\end{aligned}$$

where (a) follows from (24), (b) can be verified after some
manipulations by combining the result of Lemma 1 and (25),
and (c) holds since $\beta > 1$ and $x_{\tau_{T^*}} \ge T^*$. Finally, (d)
immediately follows from plugging in the definition of $T^*$
from (22).

Therefore, the risky policy $\pi^{T^*}$ is feasible by Definition 2.
Observe that the buffer size under any policy $\pi^T$ of the form
(19) with $T \ge T^*$ stochastically dominates that of policy $\pi^{T^*}$,
because $\pi^T$ switches to the costly server earlier, and stays in
that state longer. Hence, $\pi^T$ is feasible for all $T \ge T^*$. ■

Theorem 5 facilitates the design of risky policies with a
single-threshold structure, for any desired initial buffer size $D$
and interruption probability $\epsilon$. For a fixed $\epsilon$, when $D$ increases,
$T^*$ (the design given by Theorem 5) decreases to zero. On the
other hand, if $D$ decreases to $\frac{1}{\alpha_1} \log\big(\frac{1}{\epsilon}\big)$ (the boundary of $\mathcal{R}$),
the threshold $T^*$ quickly increases to infinity, i.e., the policy
does not switch back to the free server unless a sufficiently
large number of packets is buffered. Figure 1 plots $T^*$ and $\bar D$
as a function of $D$ for a fixed $\epsilon$. Observe that for a large range of
$D$, $T^* < D$, i.e., the costly server is not initially used. In this
range, owing to the positive drift of $x_t$, the probability of ever
using the costly server decreases exponentially in $(D - T^*)$.

Next we compute relatively tight bounds on the expected
cost of the online risky policy and compare with the previously
proposed policies.
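The threshold design can be illustrated numerically for the case $D \ge \bar D$, where the threshold is affine in $D$. The sketch below is limited to that branch and raises an error otherwise. The function name is hypothetical, and the expression used for $\beta$, namely $\alpha_1 / (\alpha_0 (1 - \alpha_0/2))$, is an assumption inferred from the bound on $p^{\pi^T}(T)$ derived in the proof of Theorem 5.

```python
import math


def risky_threshold(D, eps, alpha0, alpha1):
    """Threshold T^* of the risky policy for D >= D_bar (first branch only)."""
    beta = alpha1 / (alpha0 * (1.0 - alpha0 / 2.0))   # assumed form of beta
    d_bar = math.log(beta / eps) / alpha1
    if D < d_bar:
        raise ValueError("this sketch covers only the branch D >= D_bar")
    return (math.log(beta / eps) - alpha0 * D) / (alpha1 - alpha0)
```

With the approximate exponents `alpha0 = 0.09839` and `alpha1 = 0.37644` (assumed values for rates 1.05 and 1.2) and `eps = 1e-3`, an initial buffer of 35 packets yields a threshold of about 17.5, well below the initial buffer, so the costly server is idle at the start of playback.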

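Fig. 1. The switching threshold of the online risky policy as a function of
the initial buffer size for $\epsilon = 10^{-3}$ (see Theorem 5).

Theorem 6. For any $(D, \epsilon) \in \mathcal{R}$, consider an online risky
policy $\pi^{T^*}$ defined in Definition 4, where the threshold $T^*$ is
given by (22) as a function of $D$ and $\epsilon$. If $D \ge \bar D$ then

$$J^{\pi^{T^*}}(D, \epsilon) \le \frac{\beta}{\alpha_1 (R_1 - 1)}\, e^{-\alpha_0 (D - T^*)}, \tag{28}$$

and if $D < \bar D$

$$J^{\pi^{T^*}}(D, \epsilon) \le \frac{1 - e^{-\alpha_1 D}}{(R_1 - 1)(1 - e^{-\alpha_1 T^*})} \Big(T^* + 1 + \frac{\beta}{\alpha_1}\Big) - \frac{D}{R_1 - 1}, \tag{29}$$

where $\beta = \frac{\alpha_1}{\alpha_0 (1 - \alpha_0 / 2)}$ and $\bar D = \frac{1}{\alpha_1} \log\big(\frac{\beta}{\epsilon}\big)$.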

Proof: Similarly to the proof of Theorem 5, we first
consider the risky policy $\pi^T$ with the initial buffer size $T$. By
definition of $\pi^T$, the costly server is used until the threshold $T$
is crossed. Thus the expected cost of this policy is bounded by
the expected time until crossing the threshold plus the expected
cost given that the threshold is crossed, i.e.,

$$J^{\pi^T}(T, \epsilon) \le \frac{\mathbf{E}[x_{\tau_T}] - T}{R_1 - 1} + \mathbf{E}\big[e^{-\alpha_0 (x_{\tau_T} - T)}\big]\, J^{\pi^T}(T, \epsilon),$$

where $\tau_T$ is defined in (20). The above relation implies

$$J^{\pi^T}(T, \epsilon) \le \frac{1}{R_1 - 1} \cdot \frac{\mathbf{E}[x_{\tau_T} - T]}{1 - \mathbf{E}[e^{-\alpha_0 (x_{\tau_T} - T)}]} \le \frac{1}{\alpha_0 (R_1 - 1)\big(1 - \frac{\alpha_0}{2}\big)} = \frac{\beta}{\alpha_1 (R_1 - 1)}, \tag{30}$$

where the second inequality follows from the fact in (26). Now
for any $D \ge \bar D$ we can write

$$J^{\pi^{T^*}}(D, \epsilon) = \mathbf{P}\Big(\min_{t \ge 0} x_t \le T^* \,\Big|\, x_0 = D\Big)\, J^{\pi^{T^*}}(T^*, \epsilon) \le e^{-\alpha_0 (D - T^*)}\, J^{\pi^{T^*}}(T^*, \epsilon),$$

where the inequality holds by Theorem 1. Combining this with
(30) gives the result in (28).

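If $D < \bar D$, the risky policy uses the costly server until the
threshold $T^*$ is crossed at $\tau_{T^*}$ or the interruption event ($\tau_e$),
whichever happens first. Afterwards, no extra cost is incurred
if an interruption has occurred. Otherwise, by (30) an extra
cost of at most $\frac{\beta}{\alpha_1 (R_1 - 1)}$ is incurred, i.e.,

$$J^{\pi^{T^*}}(D, \epsilon) \le \mathbf{E}[\min\{\tau_e, \tau_{T^*}\}] + \mathbf{P}(\tau_{T^*} < \tau_e)\, \frac{\beta}{\alpha_1 (R_1 - 1)}.$$

By Doob's optional stopping theorem applied to the martingale
$z_t = x_t - (R_1 - 1) t$, we obtain

$$D = \mathbf{P}(\tau_{T^*} < \tau_e)\, \mathbf{E}[x_{\tau_{T^*}} \mid \tau_{T^*} < \tau_e] - (R_1 - 1)\, \mathbf{E}[\min\{\tau_e, \tau_{T^*}\}],$$

which implies

$$\mathbf{E}[\min\{\tau_e, \tau_{T^*}\}] \le \frac{\mathbf{P}(\tau_{T^*} < \tau_e)(T^* + 1) - D}{R_1 - 1}.$$

By combining the preceding relations we conclude that

$$J^{\pi^{T^*}}(D, \epsilon) \le \frac{1}{R_1 - 1} \Big[\mathbf{P}(\tau_{T^*} < \tau_e) \Big(T^* + 1 + \frac{\beta}{\alpha_1}\Big) - D\Big],$$

which immediately implies (29) by employing Lemma 1. ■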

In the following, we compare the expected cost of the
presented policies using numerical methods, and illustrate that
the bounds derived in Theorems 3, 4 and 6 on the expected
cost function are close to the exact value.

D. Performance Comparison 



Fig. 2. Expected cost (units of time) of the presented control policies as a
function of the initial buffer size $D$ for interruption probability $\epsilon = 10^{-3}$.
Curves shown: off-line policy; online safe policy (analysis and simulation);
online risky policy (analysis and simulation). The analytical bounds are
given by Theorems 3, 4 and 6.

Figure [2] compares the expected cost functions of the off- 
line, online safe and online risky policies as a function of 
the initial buffer size D, when the interruption probability is 
fixed to e = 10 ~ 3 , the arrival rate from the free server is 
Rq = 1.05, and the arrival rate from the costly server is R c = 
Ri — Rq = 0.15. We plot the bounds on the expected cost 
given by Theorems [3] E] and [6] as well as the expected cost 
function numerically computed by the Monte-Carlo method. 

Observe that the expected cost of the risky policy is sig- 
nificantly smaller that both online safe and off-line policies. 
For example, the risky policy allows us to decrease the initial 
buffer size from 70 to 20 with an average of70x0.15~10 
extra packets from the costly server. The expected cost in terms 
of the number packets received from the costly server is 43 
and 61 for the online safe and off-line policy, respectively. 

Moreover, note that the mere existence of the costly server as a
backup allows us to improve the user's quality of experience
without actually using many packets from the costly server. For
example, observe that the risky policy satisfies the QoE metrics
$D = 35$ and $\epsilon = 10^{-3}$ using, on average, only about one
extra packet from the costly server. Without the costly server,
however, decreasing the initial buffer size from 70 to 35 requires
the interruption probability to increase from $10^{-3}$ to about
0.03 (see Theorem 1).
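The last comparison can be checked numerically. The sketch below rests on an assumption not restated in this section: with only the free server, the interruption probability for initial buffer size $D$ is taken to be $e^{-\alpha_0 D}$, where $\alpha_0$ is the positive root of $\alpha = R_0(1 - e^{-\alpha})$. The function names `decay_rate` and `interruption_probability` are illustrative only.

```python
import math


def decay_rate(R, tol=1e-12):
    """Positive root of alpha = R * (1 - exp(-alpha)), found by bisection.

    Assumption: this root is the exponent of the interruption probability
    for a single Poisson server of rate R > 1 and unit playback rate.
    """
    if R <= 1.0:
        raise ValueError("a positive root exists only for R > 1")

    def f(a):
        return a - R * (1.0 - math.exp(-a))

    lo, hi = 1e-9, R  # f is negative just above 0 and f(R) = R*exp(-R) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interruption_probability(D, R0):
    """Assumed form exp(-alpha_0 * D) when only the free server is used."""
    return math.exp(-decay_rate(R0) * D)


if __name__ == "__main__":
    R0 = 1.05  # free-server rate used in Fig. 2
    for D in (70, 35):
        print(D, interruption_probability(D, R0))
```

Under this assumption, $D = 70$ gives a probability close to $10^{-3}$ and $D = 35$ gives roughly 0.03, in line with the figures quoted above.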

V. Dynamic Programming Approach 

In this section, we present a characterization of the optimal
association policy in terms of the Hamilton-Jacobi-Bellman
(HJB) equation. Note that because of the probabilistic constraint
over the space of sample paths of the buffer size,
the optimal policy is not necessarily Markov with respect to
the buffer size as the state of the system. We take an approach
similar to that of [4], where, by expanding the state space, a
Bellman equation is provided as the optimality condition of an
MDP with a probabilistic constraint. In particular, consider the
pair $(x,p)$ as the state variable, where $x$ denotes the buffer size
and $p$ represents the desired level of interruption probability.
The evolution of $x$ is governed by the following stochastic
differential equation



$$dx = -dt + dN_u, \qquad x_0 = D, \tag{31}$$



where $N_u$ is a Poisson counter with rate $R_u = R_0 + u \cdot R_c$.
For any $(D,\epsilon) \in \mathcal{R}$ and any optimal policy $\pi$, the constraint
$p^\pi(D) \le \epsilon$ is active. Hence, we consider the sample paths of $p$
such that $p_0 = \epsilon$ and $E[p_t] = \epsilon$ for all $t$, where the expectation
is with respect to the Poisson jumps. Let $\hat p = p + dp$ if a
Poisson jump occurs in an infinitesimal interval of length $dt$.
Also, let $dp_0$ be the change in state $p$ if no jump occurs.
Therefore,

$$0 = E[dp] = R_u\,dt\,(\hat p - p) + (1 - R_u\,dt)\,dp_0.$$

By solving the above equation for $dp_0$, we obtain the evolution
of $p$ as

$$dp = (p - \hat p)(R_u\,dt - dN_u), \qquad p_0 = \epsilon. \tag{32}$$
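The step from the zero-drift condition to (32) can be spelled out as follows. This is a sketch that keeps only first-order terms in $dt$; $\hat p$ denotes the post-jump value of $p$, and the hat is our notational choice.

```latex
\begin{align*}
0 &= R_u\,dt\,(\hat p - p) + (1 - R_u\,dt)\,dp_0
   \;\Longrightarrow\;
   dp_0 = -\frac{R_u\,dt}{1 - R_u\,dt}\,(\hat p - p)
        = (p - \hat p)\,R_u\,dt + o(dt), \\
dp &= dp_0\,(1 - dN_u) + (\hat p - p)\,dN_u
    = (p - \hat p)\,R_u\,dt - (p - \hat p)\,dN_u + o(dt).
\end{align*}
```

Dropping the $o(dt)$ terms gives $dp = (p - \hat p)(R_u\,dt - dN_u)$, and taking expectations with $E[dN_u] = R_u\,dt$ confirms that $E[dp] = 0$.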

Similarly to the arguments of Theorem 2 of 0), by principle 
of optimality we can write the following dynamic program- 
ming equation 

$$V(x,p) = \min_{u \in \{0,1\},\ \hat p \in [0,1]} \big\{ u\,dt + E[V(x + dx,\, p + dp)] \big\}. \tag{33}$$

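If $V$ is continuously differentiable, by Itô's Lemma for jump
processes, we have

$$dV(x,p) = \frac{\partial V}{\partial x}(-dt) + \frac{\partial V}{\partial p}(p - \hat p) R_u\,dt + \big(V(x+1,\hat p) - V(x,p)\big)\,dN_u,$$

which implies the following HJB equation

$$\frac{\partial V(x,p)}{\partial x} = \min_{u \in \{0,1\},\ \hat p \in [0,1]} \Big\{ u + \frac{\partial V}{\partial p}(p - \hat p) R_u + R_u \big(V(x+1,\hat p) - V(x,p)\big) \Big\}. \tag{34}$$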
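For completeness, the passage from the dynamic programming equation (33) to the HJB equation (34) can be sketched as below. The sketch assumes the Itô expansion stated above and writes $\hat p$ for the post-jump value of $p$ (our notation). It uses only $E[dN_u] = R_u\,dt$ and the fact that $\partial V / \partial x$ does not depend on the minimization variables.

```latex
\begin{align*}
E[dV] &= \Big[ -\frac{\partial V}{\partial x}
        + \frac{\partial V}{\partial p}(p - \hat p) R_u
        + R_u \big( V(x+1,\hat p) - V(x,p) \big) \Big]\,dt, \\
V(x,p) &= \min_{u,\,\hat p} \big\{ u\,dt + V(x,p) + E[dV] \big\} \\
\Longrightarrow\quad
0 &= \min_{u,\,\hat p} \Big\{ u - \frac{\partial V}{\partial x}
        + \frac{\partial V}{\partial p}(p - \hat p) R_u
        + R_u \big( V(x+1,\hat p) - V(x,p) \big) \Big\}.
\end{align*}
```

Moving $\partial V / \partial x$ to the left-hand side yields (34).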



The optimal policy $\pi$ is obtained by characterizing the
optimal solution of the partial differential equation in (34)
together with the boundary condition $V(x,1) = 0$. Since such
equations are in general difficult to solve analytically, we use
the guess-and-check approach, where we propose a candidate
for the value function and verify that it nearly satisfies the
HJB equation almost everywhere. Moreover, we show that the
trajectories of $(x_t,p_t)$ steered by the optimal actions $(u^*,\hat p^*)$
lie in a one-dimensional invariant manifold, leading to the
risky policy defined in Definition 4.
For any $(x,p) \in \mathcal{R}$, define



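$$T(x,p) = \begin{cases} \dfrac{1}{\alpha_1 - \alpha_0}\Big[\log\Big(\dfrac{\theta}{p}\Big) - \alpha_0 x\Big], & \text{if } x \ge \dfrac{1}{\alpha_1}\log\Big(\dfrac{\theta}{p}\Big), \\[2ex] \dfrac{1}{\alpha_1}\log\Big(\dfrac{p + \theta(1 - e^{-\alpha_1 x}) - 1}{p - e^{-\alpha_1 x}}\Big), & \text{otherwise,} \end{cases} \tag{35}$$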

where $\theta$ = The candidate solution for the HJB equation (34)
is given by

V(x,p) = 



a (l-!f)(Rx-l) 



-a (x-T(x,p)) 



(36) 



when x > log (^), and 



V(x,p) 



P + 6(l 



(R 1 -i)(9^r {T{x ' p)+ ^ ) 



i?l - 1' 

(37) 

when x < ^-log(^). Note that the candidate solution is 
derived from the structure of the expected cost of the risky
policy (cf. Theorem 6). We may verify that $V$ satisfies the
HJB equation (34) for all $(x,p)$ such that
$x \ge \frac{1}{\alpha_1}\log(\theta/p)$ or $x \le \frac{1}{\alpha_1}\log(\theta/p) - 1$,
but for other $(x,p)$ the HJB equation
is only approximately satisfied. This is due to bounding the
overshoots when computing the expected cost of the risky
policy. The verification of the HJB equation for our candidate
solution is tedious but straightforward; we omit it
owing to space limitations.

Theorem 7. Let $\pi^*$ be the optimal association policy obtained
from minimizing the right-hand side of the HJB equation in
(34) for the value function given by (36) and (37). Then $\pi^*$
is a risky policy as defined in Definition 4 with threshold level
$T(D,\epsilon)$, where $D$ is the initial buffer size and $\epsilon$ is the desired
interruption probability.

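Proof: We only sketch the proof owing to space limitations.
First, we can show that the optimal action $u^*(x,p)$ takes the
following form

$$u^*(x,p) = \begin{cases} 0, & \text{if } x \ge \dfrac{1}{\alpha_1}\log\Big(\dfrac{\theta}{p}\Big), \\ 1, & \text{otherwise.} \end{cases} \tag{38}$$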



Moreover, we may verify that for the initial condition
$(x_0,p_0) = (D,\epsilon)$, the trajectory of $(x_t,p_t)$ steered by the
optimal actions $(u^*,\hat p^*)$ is limited to a one-dimensional invariant
manifold $\mathcal{M}(D,\epsilon)$, where

$$\mathcal{M}(D,\epsilon) = \Big\{(x,p) : p = \theta e^{-\alpha_0 x - (\alpha_1 - \alpha_0) T(D,\epsilon)},\ x \ge T(D,\epsilon)\Big\} \cup \Big\{(x,p) : p = \frac{(\theta - 1) e^{-\alpha_1 T(D,\epsilon)} + e^{-\alpha_1 x}\big(1 - \theta e^{-\alpha_1 T(D,\epsilon)}\big)}{1 - e^{-\alpha_1 T(D,\epsilon)}},\ x < T(D,\epsilon)\Big\},$$

and $T(D,\epsilon)$ is given by (35). Therefore, by plugging the
above relation back into (38), we can show that the optimal
action $u^* = 0$ if and only if $x \ge T(D,\epsilon)$, i.e., the optimal
policy given by the HJB equation is of the form of the risky
policy in Definition 4 with threshold $T = T(D,\epsilon)$. ■
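A threshold policy of this form is straightforward to simulate. The following is a minimal Monte-Carlo sketch under stated assumptions: the buffer follows (31), the costly server is on whenever the buffer is at or below a caller-supplied threshold `T`, the cost is the time spent with the costly server on, and each path is truncated at a finite `horizon`. The function names, the truncation and the example parameters are ours, not the authors' simulation setup.

```python
import random


def simulate_threshold_policy(D, T, R0, Rc, horizon, rng):
    """Simulate one buffer path; return (time with u = 1, interrupted?).

    The buffer drains at unit rate and gains one packet per Poisson arrival.
    Assumes D > 0 and T >= 0. The last step may run slightly past `horizon`.
    """
    x, t, cost = float(D), 0.0, 0.0
    while t < horizon:
        use_costly = x <= T
        rate = (R0 + Rc) if use_costly else R0
        gap = rng.expovariate(rate)  # time to the next arrival
        # Time to the regime boundary if no packet arrives first: the buffer
        # empties (u = 1) or drains down to the threshold T (u = 0).
        to_boundary = x if use_costly else x - T
        if gap < to_boundary:
            x += 1.0 - gap  # drain for `gap`, then receive one packet
            dt = gap
        else:
            # Boundary reached first; the arrival clock is resampled next
            # iteration, which is valid because Poisson arrivals are memoryless.
            x = 0.0 if use_costly else float(T)
            dt = to_boundary
        t += dt
        if use_costly:
            cost += dt
        if x <= 0.0:
            return cost, True  # playback interrupted
    return cost, False  # horizon reached without interruption


def estimate(D, T, R0, Rc, horizon, runs, seed=0):
    """Average cost and empirical interruption frequency over `runs` paths."""
    rng = random.Random(seed)
    total_cost, hits = 0.0, 0
    for _ in range(runs):
        cost, interrupted = simulate_threshold_policy(D, T, R0, Rc, horizon, rng)
        total_cost += cost
        hits += interrupted
    return total_cost / runs, hits / runs


if __name__ == "__main__":
    print(estimate(D=20, T=10.0, R0=1.05, Rc=0.15, horizon=500.0, runs=200))
```

Setting `T = 0` disables the costly server, so the estimated cost is zero and the interruption frequency reflects the free server alone; raising `T` trades a higher cost for fewer interruptions.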



VI. Conclusions and Future Work 

In this paper we studied the problem of selecting the access
networks in a heterogeneous wireless environment for media
streaming applications. Our objective was to investigate the
trade-offs between the network usage cost and the user's
QoE requirements, parameterized by the initial waiting time and
the allowable probability of interruption in media playback. We
analytically characterized and compared the expected cost
of both off-line and online policies, finally showing that a
threshold-based online risky policy achieves the lowest cost.
Moreover, we derived an HJB equation for the problem of
finding the optimal deterministic policy formulated as an MDP
with a probabilistic constraint, and verified that the online
risky policy nearly satisfies the HJB equation. Numerical
analysis also confirmed our analytical results, showing that
the mere availability of a costly server used as a backup
significantly improves the QoE of media streaming without
incurring a significant usage cost.

In the future, we would like to study more accurate models
of channel variations, such as the two-state Markov model due
to Gilbert and Elliott. In this work we focused on deterministic
network association policies. Another extension of this work
would be to study randomized control policies. Finally,
we would like to study the peer-to-peer aspect of the
system further, to understand the decision making at the system level.

References 

[1] C. Labovitz, D. McPherson, and S. Iekel-Johnson. 2009 Internet 
Observatory report. In NANOG-47, October 2009. 

[2] R. Laroia. Future of Wireless? The Proximate Internet. In Proc. of 
the Second International Conference on Communication Systems and 
Networks (COMSNETS), Bangalore, India, January 2010. 

[3] Knocking, http://knockinglive.com, 2010. 

[4] R. Chen. Constrained stochastic control with probabilistic criteria and
search optimization. In Proc. 43rd IEEE Conference on Decision and
Control, December 2004.

[5] Y. P. Zhou, D. M. Chiu, and J. C. S. Lui. A simple model for analyzing
P2P streaming protocols. In Proc. IEEE ICNP 2007.

[6] T. Bonald, L. Massoulie, F. Mathieu, D. Perino, and A. Twigg. Epidemic
live streaming: optimal performance trade-offs. SIGMETRICS Perform.
Eval. Rev., 36(1):325-336, 2008.

[7] L. Ying, R. Srikant, and S. Shakkottai. The Asymptotic Behavior of
Minimum Buffer Size Requirements in Large P2P Streaming Networks.
In Proc. of the Information Theory and Applications Workshop (to
appear), San Diego, CA, February 2010.

[8] S. Acedanski, S. Deb, M. Medard, and R. Koetter. How good is random
linear coding based distributed networked storage. In NetCod, 2005.

[9] C. Gkantsidis, J. Miller, and P. Rodriguez. Comprehensive view of a
live network coding p2p system. In Proc. ACM SIGCOMM, 2006.

[10] M. Wang and B. Li. $R^2$: Random push with random network coding in
live peer-to-peer streaming. IEEE JSAC, Special Issue on Advances in
Peer-to-Peer Streaming Systems, 25:1655-1666, 2007.

[11] H. Chi and Q. Zhang. Deadline-aware network coding for video on
demand service over p2p networks. In PacketVideo, 2006.

[12] D. Kumar, E. Altman, and J-M. Kelif. Globally Optimal User-Network
Association in an 802.11 WLAN and 3G UMTS Hybrid Cell. In Proc.
of the 20th International Teletraffic Congress (ITC-20), Ottawa, Canada,
June 2007.

[13] C. Feng and B. Li. On large-scale peer-to-peer streaming systems 
with network coding. In Proceedings of the 16th ACM international 
conference on Multimedia, Vancouver, Canada, October 2008. 

[14] A. ParandehGheibi, M. Medard, S. Shakkottai, and A. Ozdaglar. Avoid- 
ing interruptions - QoE trade-offs in block-coded streaming media 
applications, submitted to ISIT 2010, arXiv:1001.1937 [cs.MM]. 

[15] I. Karatzas and S. Shreve. Brownian Motion and Stochastic Calculus. 
Springer, 1997. 



